Figure 7.20(a) shows the Lasso discrimination model constructed for
discriminating between the USA sequences and the India sequences. In
this discrimination model, the five words with the best
discrimination power for separating the genomic patterns of these two
countries were TAC, AGC, ATG, CAT and GTC. Figure 7.20(b) shows the
Lasso discrimination model constructed for discriminating between the
USA sequences and the Russia sequences. In this model, the top five
words were ATA, AGC, GAA, AAC and TAT.
Figure 7.20 (a) The Lasso discrimination model constructed for discriminating the USA
sequences against the India sequences. (b) The Lasso discrimination model constructed for
discriminating the USA sequences against the Russia sequences.
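To make the idea of ranking words by discrimination power concrete, the following is a minimal Python sketch, not the procedure used to produce the figure. It assumes that each sequence is summarised by its 64 overlapping 3-mer frequencies and that a Lasso (L1-penalised least squares on 0/1 group labels, fitted by coordinate descent) is an acceptable stand-in for the discrimination model. The toy sequences, the planted motif GGC, the penalty `lam=0.1`, and all function names are hypothetical choices made for illustration only.

```python
import itertools
import random

# All 64 possible 3-mer words over the DNA alphabet.
WORDS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]

def kmer_freqs(seq, k=3):
    """Relative frequency of each 3-mer word, using overlapping windows."""
    counts = dict.fromkeys(WORDS, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases (e.g. N)
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in WORDS]

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by coordinate descent on standardised columns and centred y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for beta = 0
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Covariance of column j with the residual that excludes word j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

rng = random.Random(0)

def toy_sequence(enriched, length=300, motif="GGC", n_insert=20):
    """Hypothetical toy data: random DNA, optionally with a planted motif."""
    seq = [rng.choice("ACGT") for _ in range(length)]
    if enriched:
        for _ in range(n_insert):
            pos = rng.randrange(length - len(motif) + 1)
            seq[pos:pos + len(motif)] = motif
    return "".join(seq)

labels = [0] * 10 + [1] * 10  # two hypothetical groups of ten sequences
X = [kmer_freqs(toy_sequence(bool(lab))) for lab in labels]
beta = lasso_cd(X, labels, lam=0.1)

# Rank words by absolute coefficient (on the standardised scale) and keep
# at most five words that the Lasso did not shrink to zero.
ranked = sorted(zip(WORDS, beta), key=lambda wb: abs(wb[1]), reverse=True)
top_words = [(w, b) for w, b in ranked if b != 0.0][:5]
print(top_words)
```

Because the L1 penalty sets most coefficients exactly to zero, only a few words survive, and sorting the survivors by absolute coefficient gives a ranking of the kind reported in the text. A larger penalty keeps fewer words; a smaller one keeps more.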
Figure 7.21 shows a hierarchical cluster generated using the kmer
package based on the 3-mer word library for five randomly selected USA
sequences and five randomly selected India sequences. It can be seen
that the USA and India sequences generally formed two distinct clusters.
Figure 7.21 The hierarchical cluster generated by the kmer package for five randomly
selected sequences from the USA and five sequences from India.
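The clustering step can be illustrated with a small Python sketch. It does not call the kmer package and does not reproduce its distance measure. Instead it assumes Euclidean distance between 3-mer frequency profiles and average-linkage agglomerative clustering. The two groups of toy sequences (AT-rich and GC-rich) and all function names are hypothetical.

```python
import itertools
import math
import random

# All 64 possible 3-mer words over the DNA alphabet.
WORDS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]

def kmer_profile(seq, k=3):
    """Relative frequency of each 3-mer word, using overlapping windows."""
    counts = dict.fromkeys(WORDS, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in WORDS]

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean pairwise Euclidean
    distance between their members (average linkage).
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(
                    math.dist(profiles[i], profiles[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

rng = random.Random(1)

def toy_sequence(weights, length=300):
    """Hypothetical toy data: random DNA with the given A, C, G, T weights."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))

# Sequences 0-4 are AT-rich, sequences 5-9 are GC-rich.
sequences = [toy_sequence([4, 1, 1, 4]) for _ in range(5)]
sequences += [toy_sequence([1, 4, 4, 1]) for _ in range(5)]

profiles = [kmer_profile(s) for s in sequences]
clusters = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Cutting the merge process at two clusters corresponds to reading the top split of a dendrogram. With real sequences the separation is usually less clean than in this toy case, which is consistent with the text's wording that the groups "generally" formed two clusters.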